Efficient Exploration and Value Function Generalization in Deterministic Systems

Authors

  • Zheng Wen
  • Benjamin Van Roy

Abstract

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and, as a solution, propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function Q* lies within the hypothesis class Q, OCP selects optimal actions over all but at most dim_E[Q] episodes, where dim_E denotes the eluder dimension. We establish further efficiency and asymptotic performance guarantees that apply even if Q* does not lie in Q, for the special case where Q is the span of pre-specified indicator functions over disjoint sets.
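The abstract describes OCP only at a high level. Below is a minimal sketch in its spirit, assuming a finite hypothesis class of time-indexed Q-functions that contains Q* (the case covered by the dim_E[Q] bound). The agent acts greedily with respect to the most optimistic surviving hypothesis. After each episode it propagates interval constraints backward along the observed trajectory and discards hypotheses that violate them. All names (`ocp_sketch`, `propagate_constraints`, `step`, `q_star`, `q_wrong`) and the toy system are hypothetical illustrations, not the authors' implementation. The sketch also does not attempt the case where Q* lies outside Q.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch: a hypothesis maps (period t, state x, action a) to a value.
Hypothesis = Callable[[int, int, int], float]
Transition = Tuple[int, int, float, int]  # (x_t, a_t, r_t, x_{t+1})


def optimistic_action(V: List[Hypothesis], t: int, x: int, actions: List[int]) -> int:
    # Optimism: rate each action by the largest value any surviving hypothesis assigns it.
    return max(actions, key=lambda a: max(Q(t, x, a) for Q in V))


def run_episode(V, x0, horizon, actions, step) -> List[Transition]:
    # Act optimistically for one episode of the deterministic system and record it.
    traj, x = [], x0
    for t in range(horizon):
        a = optimistic_action(V, t, x, actions)
        x_next, r = step(t, x, a)
        traj.append((x, a, r, x_next))
        x = x_next
    return traj


def propagate_constraints(V, traj, actions):
    # Backward pass: keep hypotheses whose Q(t, x_t, a_t) lies in [lo, hi], where the bounds
    # are the immediate reward plus the range of next-period values over surviving hypotheses.
    H = len(traj)
    for t in reversed(range(H)):
        x, a, r, x_next = traj[t]
        if t == H - 1:
            lo = hi = r  # final period: the value is just the reward
        else:
            next_vals = [max(Q(t + 1, x_next, b) for b in actions) for Q in V]
            lo, hi = r + min(next_vals), r + max(next_vals)
        V = [Q for Q in V if lo <= Q(t, x, a) <= hi]
        if not V:
            break  # only possible when Q* is not in the class
    return V


def ocp_sketch(hypotheses, x0, horizon, actions, step, episodes):
    V = list(hypotheses)
    returns = []
    for _ in range(episodes):
        traj = run_episode(V, x0, horizon, actions, step)
        returns.append(sum(r for _, _, r, _ in traj))
        V = propagate_constraints(V, traj, actions)
        # If Q* is in the class it always satisfies its own bounds, so V never empties.
        assert V, "no hypothesis survived; Q* is not in the class"
    return returns, V


def step(t, x, a):
    # Toy deterministic system (hypothetical): the action picks the next state, and
    # reward 1 is paid only for taking action 1 in state 1 during period 1.
    return a, (1 if (t == 1 and x == 1 and a == 1) else 0)


def q_star(t, x, a):
    # True Q-values of the toy system with horizon 2.
    if t == 1:
        return 1 if (x == 1 and a == 1) else 0
    return 1 if a == 1 else 0


def q_wrong(t, x, a):
    # A misleading hypothesis that overrates action 0 everywhere.
    return 2 if a == 0 else 0


returns, V = ocp_sketch([q_star, q_wrong], x0=0, horizon=2, actions=[0, 1], step=step, episodes=3)
print(returns)  # [0, 1, 1]
print(V == [q_star])  # True
```

In this toy run the agent follows the misleading hypothesis and is suboptimal only in the first episode. The constraints from that episode eliminate `q_wrong`, and the agent acts optimally afterward.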

Similar resources

Comparing Geostatistical Seismic Inversion Based on Spectral Simulation with Deterministic Inversion: A Case Study

Seismic inversion is a method that extracts acoustic impedance data from seismic traces. Source wavelets are band-limited, so seismic traces lack low- and high-frequency information. As a result, deterministic seismic inversion applied to real data produces a smooth result. The low-frequency component is obtained fro...

Generalization and Exploration via Randomized Value Functions

We propose randomized least-squares value iteration (RLSVI) – a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or ε-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains...

A Novel Combinatorial Approach to Discrete Fracture Network Modeling in Heterogeneous Media

In Iran, fractured reservoirs hold about 85 percent of oil resources and 90 percent of gas resources. A comprehensive study of fractures, the main factor affecting fluid flow or possibly acting as flow barriers, seems necessary for reservoir development studies. High degrees of heterogeneity and sparseness of data have incapacitated conventional deterministic methods in fracture network modelin...

Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration

We present an implementation of model-based online reinforcement learning (RL) for continuous domains with deterministic transitions that is specifically designed to achieve low sample complexity. To achieve low sample complexity, since the environment is unknown, an agent must intelligently balance exploration and exploitation, and must be able to rapidly generalize from observations. While in...

Adaptive-Resolution Reinforcement Learning with Efficient Exploration in Deterministic Domains

We propose a model-based learning algorithm, the Adaptive-resolution Reinforcement Learning (ARL) algorithm, that aims to solve the online, continuous state space reinforcement learning problem in a deterministic domain. Our goal is to combine an adaptive-resolution approximation scheme with efficient exploration in order to obtain fast (polynomial) learning rates. The proposed algorithm uses an a...


Publication date: 2013